1.  SUMMARY

Recent technological advances in sensor manufacturing enable the use of separate spectral bands (e.g., MWIR and LWIR) to generate spatially registered imagery. Human factors experiments can be used to test whether a sensor can improve operator performance in detecting or recognizing a target [1]. Although human factors experiments are of tremendous value, they are time consuming and resource intensive. To reduce the costs associated with collecting behavioral data, an alternative approach is discussed: we propose using signal detection theory to complement, and reduce the amount of, classical human performance testing. As a test case, we have studied whether multi-spectral sensors are significantly better than single-band sensors.

Scribner, Satyshur, and Kruer (1993) demonstrated that a two-dimensional (spatial) matched filter, optimized for specific target and background power spectra, can be used to estimate an observer's ability to detect a target embedded in a cluttered background. Three different background images were used, with and without a target present. False alarm and target detection probabilities were computed, and the results were plotted as Receiver Operating Characteristic (ROC) curves. The matched filter ROC curves were then compared to behavioral ROC curves. The matched filter ROC curves were similar to the behavioral ROC curves: color fusion and long-wave infrared showed the highest sensitivity, while the mid-wave and short-wave infrared scenes were significantly less sensitive. These results indicate that matched filter analysis may be used to model human behavior.

Keywords: Signal Detection Theory, Matched Filter Analysis, Receiver Operating Characteristic, Human Performance Modeling, Target Detection

2.  INTRODUCTION 

Military applications require the use of various sensors to determine operational threats and opportunities. The combination of such sensors promises to provide an account of the opposition that is superior to that provided by individual sensors operating in particular wavebands. It is desirable to choose the optimal types and combinations of sensor information that are maximally responsive to the target types likely to be encountered in the field. This assessment must be done under realistic physical and psychophysical circumstances. Furthermore, it is desirable that the information obtained be modeled productively, i.e., so that experimental results can be interpolated and extrapolated near the conditions under which they were obtained.
This study will modify an existing matched filter model to fit meaningful human performance metrics; the model can be revised and extended where necessary to represent the data obtained during field tests. This model will be used to evaluate fused imagery system requirements and performance. Furthermore, it will indicate what type of data will be needed to validate the sensor fusion data to be collected in the future.

Different disciplines have developed methodologies to measure an operator's ability to detect a target while viewing sensor imagery. The Night Vision and Electronic Sensors Directorate (NVESD) has developed analytical models to predict target detection ranges for sensors that operate in the visible and infrared bands [3, 4]. These electro-optical models provide an adequate prediction of a user's ability to detect a target at any given range. To improve the validity of the models, atmospheric conditions, sensor characteristics, target characteristics, clutter, the estimated time that an operator searches for the target, and an assortment of other parameters are used to model human performance. Currently, these models are limited to single-band sensors; however, next-generation models may incorporate multi-spectral sensor performance.

Recent technological advances in the design and manufacturing of multi-spectral sensors now allow spatially registered imagery to be mapped to a high-speed processor, where it can be fused and displayed to an end user [5]. Within the last several years, numerous groups have developed sensor fusion algorithms [2, 6-10] that may improve operator performance. These techniques differ in their algorithmic approach, but they all have the same objective: improving the image quality for the observer. Several behavioral studies [1, 16] and image quality studies [2, 10] have tried to quantify the benefits of sensor fusion, but the results were inconsistent. This is not surprising, considering that in many cases different spectral bands were used and a number of other parameters varied as well, such as camera sensitivity and target and background characteristics.

Tanner and Swets (1954) proposed that statistical decision theory may be used to predict operators' decision behavior. Signal detection theory is a common technique used by vision scientists to measure subjects' sensitivity and response bias to a set of stimuli. Whether target detection is accomplished through the human visual system or by means of a matched filter, signal detection theory treats the task as discriminating signal plus noise from a steady-state noise background. Vision scientists use signal detection theory to measure operator performance for an assortment of stimuli. Similarly, an image-processing algorithm may use a matched filter technique based on signal detection theory to predict operators' performance through a sensor. Ideally, the Receiver Operating Characteristic (ROC) plots derived from both methodologies should yield similar results. The advantage of the matched filter technique is that it allows the system engineer to conduct multiple simulations for a wide variety of backgrounds and target types. These simulations require minimal resources compared to costly human performance field tests.


Paper presented at the RTO SCI Workshop on “Search and Target Acquisition”, held in Utrecht, The Netherlands, 21-23 June 1999, and published in RTO MP-45.
A matched filter is a two-dimensional (2-D) array that has been optimized to maximize the signal-to-noise ratio and that provides a measure of the spatial correlation between the input image and the reference image. The resulting filters are “tuned” to suppress the effects of background clutter and other noise sources in the image. A matched filter is the optimum linear filter for detecting the target. Scribner et al. (1993) applied a matched filter to long-wave infrared (9.0 to 11.6 µm), medium-wave infrared (4.5 to 5.5 µm), and short-wave infrared (2.0 to 2.6 µm) sensor images, as well as to a single image fused from these bands. In this approach, spatial-only matched filters were derived for the three infrared images and a spatial-spectral matched filter for the fused composite image. The intent of these matched filters was to simulate the detection ability and sensitivity of the human visual system. Although the matched filter is commonly used within the physics and engineering communities to quantify sensor performance, it has also been used to some extent in the medical field to detect tumors in X-ray images.

The objective of this paper is to compare and contrast behavioral and matched filter ROC plots to determine whether the matched filter technique is a good predictor of human performance. The advantages of the matched filter model are threefold. First, it provides a sensor image fusion metric that can be used to evaluate different sensors. Second, it quantifies the degree of “enhancement” achieved by a fusion process, thus allowing direct comparisons of the various sensor fusion algorithms. Third, it may be able to predict human visual performance across a variety of background and target conditions.

A standard visual search paradigm will be conducted for three different multi-spectral natural scenes. In Experiment 1, behavioral ROC plots will be compared to the matched filter ROC plots to determine whether the matched filter technique accurately predicts observers' sensitivity. In Experiment 2, eye movement data will be recorded to determine whether observers' scan patterns correlate with the ROC analysis. It is hypothesized that observers viewing a low-contrast stimulus will exhibit longer saccade lengths and shorter fixations, as well as a low sensitivity for detecting the target (i.e., fewer correct responses and more errors).


3.  EXPERIMENT  1 
3.1.  Behavioral  Test 

Subjects: Fourteen male military officers (mean age 31.7 years) participated in this visual search study. All subjects had normal (20/20) or corrected-to-normal acuity and normal color vision. Subjects were naive to the purpose of the experiment, and none had participated in previous visual search experiments. All subjects signed an informed consent form and were briefed on the ethical conduct for subject participation in accordance with the Protection of Human Subjects.

Apparatus: Stimuli were presented by a VisionWorks computer graphics system on an IDEK MF-8521 high-resolution color monitor (21" × 20" of viewable area, 0.28 mm dot pitch) with a non-glare, anti-reflective screen and a P-22 phosphor. The monitor's resolution was 800 by 600 pixels (x = 75.02 and y = 74.92 pixels/degree), with a 98.9 Hz frame rate, a mean chromaticity of Y = 50.2, x = 0.334, y = 0.336 (1931 CIE), and a maximum luminance of 100 cd/m². The luminance of the monitor was linearized by means of an 8-bit look-up table for each of the red, green, and blue guns. Subjects viewed the monitor from 1.5 meters, positioned by an adjustable chinrest, under mesopic conditions.

Stimuli: Three single-band stimuli (short-, mid-, and long-wave infrared) and two composite stimuli (fused-color and fused-gray) were selected from a multispectral natural scene database. Scenes were selected that contained heterogeneous terrain characteristics and no man-made targets (Figure 1).



Figure 1. Single-band (a) short-wave infrared, (b) mid-wave infrared, and (c) long-wave infrared images, and (d) the fused color image, created by taking the principal component direction of correlated thermal and visible pixel values as the luminance direction in a transformed space [2]. The airplane target is located in the upper right quadrant.

Each background scene was 320 by 400 pixels (subtending 8.54° by 7.24° of visual angle), and 50 percent of the stimuli contained a randomly placed airplane (subtending 0.1° by 0.1°). For each scene, the spectral characteristics of the airplane target were based on a target measured within the multispectral database. The long-wave target pixel values were 255, the mid-wave target pixel values were 73, and the short-wave target pixel values were 114. The fused color target values were red = 255, green = 73, and blue = 114. The achromatic fused images were spatially identical to the chromatic fused images; the achromatic condition was employed to control for luminance effects. For each background scene, the target was present in 50 trials. The target location was generated by a random number generator, and the target was then inserted at that location. The target placement for each of the 50 locations was identical across the different background types.
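
The randomized but repeatable placement described above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' software: it assumes each band is stored as a list of pixel rows, a square target of hypothetical size `TARGET_PX` (about 0.1° at roughly 75 pixels/degree), an assumed row/column orientation for the 320 by 400 scene, and a hypothetical fixed seed so that the same 50 locations are reused for every background type.

```python
import random

# Target pixel values per band, as quoted in the text (8-bit gray levels).
TARGET_VALUES = {"lwir": 255, "mwir": 73, "swir": 114}
N_COLS, N_ROWS = 320, 400   # scene size from the text; row/column orientation assumed
TARGET_PX = 7               # assumption: ~0.1 deg at ~75 pixels/deg
N_PRESENT = 50              # target-present trials per background


def target_locations(n=N_PRESENT, seed=1999):
    """Draw n top-left target corners once; the fixed (hypothetical) seed makes them repeatable."""
    rng = random.Random(seed)
    return [(rng.randrange(N_COLS - TARGET_PX + 1), rng.randrange(N_ROWS - TARGET_PX + 1))
            for _ in range(n)]


def insert_target(band, x0, y0, value):
    """Return a copy of `band` (a list of pixel rows) with a square target pasted at (x0, y0)."""
    out = [row[:] for row in band]
    for y in range(y0, y0 + TARGET_PX):
        for x in range(x0, x0 + TARGET_PX):
            out[y][x] = value
    return out


# Usage: the same locations are reused for every background type.
locations = target_locations()
background = [[0] * N_COLS for _ in range(N_ROWS)]
x0, y0 = locations[0]
lwir_trial = insert_target(background, x0, y0, TARGET_VALUES["lwir"])
```

Drawing all locations once from a seeded generator, rather than per background, is what keeps the placement identical across formats.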

The composite stimulus approach assigns each pixel a color vector defined by the detected power in the registered three-band imagery [2]. Scatterplots (Figure 2) of the image ensemble of colors frequently reveal a pronounced anti-correlation between short and long wavelengths, consistent with Kirchhoff's law (reflective objects that appear bright in the short-wave infrared typically have low emissivity and appear dark in the long-wave infrared). For a given registered short- and long-wave infrared image pair, the principal component (major axis) corresponds to the luminance channel, and the orthogonal (minor) axis corresponds to the color channel. Assigning luminous intensity to the correlated component is straightforward, but the assignment of color to the uncorrelated features is not immediately obvious. Here, pixel color is assigned on the basis of color opponency. By assigning, a priori, one color to the image-intensified (I2) band and its color opponent to the infrared (IR) band, the resulting display shows two, and only two, opponent colors of various saturation. This gives an immediately intuitive representation of which spectral band dominates and by how much. It must be strongly emphasized that this system is mathematically incomplete: it does not allow the perception of actual visible colors in the estimated-reflectivity sense. Distinction between various vegetation, soil types, structures, water, and sky is based on coincident phenomenology in each spectral region, not on estimating a physical property such as emissivity.
Figure 2. Color fusion algorithm technique. Two highly correlated bands will have a cigar-shaped distribution. The principal component direction (L1′) is the luminance channel, and the orthogonal axis (L2′) is the chromatic channel. Increasing color contrast (dotted line) while retaining the luminance characteristics is achieved by re-scaling L2′. In an actual sensor system, the principal component direction is based on the statistics of the scene (determined adaptively).
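
The principal-component step in Figure 2 can be illustrated for a single registered band pair. This is a minimal sketch under stated assumptions, not the published fusion algorithm: the bands are flat lists of pixel values, the chroma gain and mid-gray offset are hypothetical, and the re-scaled minor axis is mapped onto an assumed red/cyan opponent pair. The sign of the principal eigenvector, and hence of L1′, is arbitrary.

```python
import math


def principal_axis(band_a, band_b):
    """Means, major-axis angle, and correlation of the (band_a, band_b) pixel scatter."""
    n = len(band_a)
    ma, mb = sum(band_a) / n, sum(band_b) / n
    caa = sum((a - ma) ** 2 for a in band_a) / n
    cbb = sum((b - mb) ** 2 for b in band_b) / n
    cab = sum((a - ma) * (b - mb) for a, b in zip(band_a, band_b)) / n
    # Angle of the largest-variance eigenvector of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * cab, caa - cbb)
    return ma, mb, theta, cab / math.sqrt(caa * cbb)


def _clip(v):
    """Round and clamp to the 8-bit display range."""
    return max(0, min(255, round(v)))


def opponent_fuse(band_a, band_b, chroma_gain=2.0):
    """Map registered pixel pairs to RGB: L1' drives luminance, re-scaled L2' drives red/cyan."""
    ma, mb, theta, _ = principal_axis(band_a, band_b)
    c, s = math.cos(theta), math.sin(theta)
    rgb = []
    for a, b in zip(band_a, band_b):
        da, db = a - ma, b - mb
        l1 = c * da + s * db                   # projection on the major axis (luminance)
        l2 = chroma_gain * (-s * da + c * db)  # minor axis, re-scaled (chromatic)
        lum = 128 + l1                         # hypothetical mid-gray offset
        rgb.append((_clip(lum + l2), _clip(lum - l2), _clip(lum - l2)))
    return rgb


# Usage with toy, anti-correlated short- and long-wave values (illustrative only).
swir = [10, 50, 100, 150, 200]
lwir = [200, 160, 100, 60, 10]
_, _, theta, r = principal_axis(swir, lwir)
fused = opponent_fuse(swir, lwir)
```

With anti-correlated bands, as Kirchhoff's law suggests for short- and long-wave infrared, the major axis has a negative slope (`theta` < 0) and the returned correlation is close to -1.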

Procedure: Each subject participated in only one display format. Subjects were instructed to respond manually on a keyboard whether a target was present or absent within the scene, using four response categories ranging from “1” (definitely no target) to “4” (definitely a target). At the beginning of each trial, the subject fixated on a crosshair located in the center of the screen. The fixation cross was presented for 200 milliseconds, immediately followed by a 1000-millisecond presentation of the experimental stimulus. The stimulus was extinguished at the end of this presentation or when the subject made a response, whichever came first. The next trial began approximately 1 second after the subject's preceding response. Accuracy was recorded for each trial, and no feedback was given for incorrect responses.


3.2.  Matched  Filter  Analysis 

In general, the matched filter analysis paralleled the behavioral testing described above; that is, the same images and targets were used to generate numerical results. One additional step in the matched filter processing was to blur the image and target very slightly to take into account the modulation transfer functions of the display and the human visual system. This was done using a narrow Gaussian point spread function with a radius of one pixel. The computations were done using MATLAB™ software, which manipulates the image data in matrix format. Single-band filters were derived using the smoothed 2-D power spectrum of the background and the target template, with the target intensity identical to that used in the behavioral testing. The 3-D spatio-spectral (color) filter was derived by treating the target and the background as a 3-D space, with the third dimension being the spectral values of each pixel. In either case, a multidimensional matched filter can be derived in the frequency domain using the expression

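$$H(k_x, k_y, k_\lambda) = \frac{S^*(k_x, k_y, k_\lambda)}{W(k_x, k_y, k_\lambda)} \exp(-i\,\mathbf{k} \cdot \mathbf{r}_0),$$

where S is the multidimensional signal (target) representation, S* is its complex conjugate, W is the multidimensional power spectral density of the background, and the phase term exp(-i k·r_0) positions the filter at the target location r_0. The spatial frequencies k_x and k_y are image coordinate indices in the frequency domain, and k_λ is the spectral band index. The real-space filter can be found by computing the inverse 3-D Fourier transform of H,

$$h(x, y, \lambda) = \mathcal{F}^{-1}\{H(k_x, k_y, k_\lambda)\}.$$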
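
The filter equation can be sketched numerically, with NumPy standing in for the MATLAB™ code used in the study. This is an illustrative sketch under assumptions: the background PSD W is estimated with a periodogram and smoothed by a small Gaussian (the paper does not specify its smoothing method), the template holds the target at the array origin so that r_0 = 0 and the phase term equals 1, and all helper names are hypothetical. The same `gaussian_blur` helper could model the one-pixel Gaussian point spread function described above. Because `np.fft.fftn` also handles a third (spectral band) axis, `matched_filter_freq` accepts 3-D arrays; the PSD smoothing is skipped in that case.

```python
import numpy as np


def gaussian_blur(img, sigma=1.0):
    """Circularly blur a 2-D array with a Gaussian PSF of standard deviation sigma (pixels)."""
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    otf = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))  # FT of the Gaussian PSF
    return np.real(np.fft.ifft2(np.fft.fft2(img) * otf))


def matched_filter_freq(template, background, psd_sigma=2.0, eps=1e-9):
    """Frequency-domain matched filter H(k) = S*(k) / W(k) for 2-D or 3-D arrays."""
    S = np.fft.fftn(template)
    bg = background - background.mean()
    W = np.abs(np.fft.fftn(bg)) ** 2 / bg.size  # periodogram estimate of the clutter PSD
    if W.ndim == 2 and psd_sigma > 0:
        W = gaussian_blur(W, psd_sigma)          # smooth the PSD (smoothing method assumed)
    return np.conj(S) / np.maximum(W, eps)


def real_space_filter(H):
    """h = inverse Fourier transform of H (real part; H is Hermitian for real inputs)."""
    return np.real(np.fft.ifftn(H))


def filter_response(image, H):
    """Correlate the image with the filter at every shift; peaks mark likely target positions."""
    return np.real(np.fft.ifftn(np.fft.fftn(image) * H))


# Usage on synthetic data (illustrative only).
rng = np.random.default_rng(0)
background = rng.normal(size=(64, 64))  # stand-in clutter: white noise (assumption)
template = np.zeros((64, 64))
template[:3, :3] = 8.0                  # 3x3 target placed at the array origin
image = background.copy()
image[20:23, 30:33] += 8.0              # same target inserted at row 20, column 30
H = matched_filter_freq(template, background)
h = real_space_filter(H)
response = filter_response(image, H)
peak = np.unravel_index(np.argmax(response), response.shape)
```

With white-noise clutter the whitening is nearly flat and the response peaks where the target was inserted; with real clutter, dividing by W suppresses the spatial frequencies where the background carries most of its power.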

Each respective filter is then applied to the image by correlating it with the image at several hundred locations: the filter values are multiplied by the corresponding pixel values aligned at each location, and the products are summed. The summed values are stored for each position; because no target is present, they give an indication of false alarms (clutter leakage noise). These values are then compared to a second set of totals produced by the same procedure, but with the target inserted at the corresponding locations, giving an indication of target detection. By comparing the signal-plus-noise values to the noise values for a given threshold value, false alarm and target detection probabilities can be calculated; repeating this over a range of thresholds yields an empirical ROC plot.
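
The threshold sweep can be sketched as follows. This is a minimal illustration, assuming the noise scores (filter output without the target) and signal-plus-noise scores (with the target) have already been collected as plain lists; the area under the curve is added only as a convenient one-number summary and is not a statistic reported in this paper.

```python
def empirical_roc(noise_scores, signal_scores):
    """Sweep a threshold over every observed score; return (P_FA, P_D) points."""
    thresholds = sorted(set(noise_scores) | set(signal_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:  # strictest threshold first, so neither rate ever decreases
        p_fa = sum(v >= t for v in noise_scores) / len(noise_scores)
        p_d = sum(v >= t for v in signal_scores) / len(signal_scores)
        points.append((p_fa, p_d))
    return points


def area_under_roc(points):
    """Trapezoidal area under the ROC curve: 0.5 = chance, 1.0 = perfect separation."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Usage with toy scores (illustrative values, not data from this study).
separated = empirical_roc([1, 2, 3], [4, 5, 6])
overlapping = empirical_roc([1, 2, 3], [1, 2, 3])
```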

3.3.  Results 

Signal detection theory separates operator performance into two components: sensitivity and response criterion (β). Sensitivity is defined as the difference between the means of the signal-plus-noise and noise distributions, expressed in units of the noise standard deviation (d′). An observer's response criterion is independent of sensitivity. The response criterion β is the ordinate of the signal-plus-noise distribution at the criterion divided by the ordinate of the noise distribution at the criterion.
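
Under the usual equal-variance Gaussian model, both quantities follow from the hit rate H and false-alarm rate F: d′ = z(H) - z(F) and β = φ(z(H)) / φ(z(F)), where z is the inverse of the standard normal cumulative distribution and φ its density. The sketch below uses Python's standard `statistics.NormalDist`; the equal-variance assumption is ours and is not stated explicitly in the text.

```python
from statistics import NormalDist

_Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def d_prime(hit_rate, fa_rate):
    """d' = z(H) - z(F): separation of the SN and N means in noise-SD units."""
    # Rates must lie strictly between 0 and 1; one common convention adjusts
    # rates of exactly 0 or 1 by 1/(2N) before calling this function.
    return _Z.inv_cdf(hit_rate) - _Z.inv_cdf(fa_rate)


def beta(hit_rate, fa_rate):
    """beta = SN ordinate / N ordinate at the criterion = phi(z(H)) / phi(z(F))."""
    return _Z.pdf(_Z.inv_cdf(hit_rate)) / _Z.pdf(_Z.inv_cdf(fa_rate))
```

An unbiased observer (H = 1 - F) gives β = 1; values below 1 indicate a liberal criterion.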

Both sensitivity and response criterion are derived from the probability of hits and the probability of false alarms for each experimental condition. A ROC plot is a useful illustration of the relationship between sensitivity and response bias: it plots on a single graph the joint values of the probability of hits and the probability of false alarms for each tested condition.

Figure 3 (left column). Human performance receiver operating characteristic (ROC) plots for fourteen subjects. Each format condition, (a) short-wave infrared, (b) mid-wave infrared, (c) long-wave infrared, (d) monochrome fusion, and (e) color fusion, had three subjects, except for the gray fused condition, which had two subjects. Subjects within the fused color and long-wave infrared conditions had the highest sensitivity for detecting the target, while those within the short-wave infrared and fused gray conditions were near chance (d′ = 0).

Figure 4. Matched filter ROC plot for the five different format types: short-wave infrared = diamonds, mid-wave infrared = squares, long-wave infrared = triangles, fused color = circles, fused monochrome = stars. Although the ROC sensitivities of the formats are not quantitatively identical, the matched filter technique gives excellent qualitative agreement with the human performance tests.
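
With the four-category rating responses used in Experiment 1, each possible cut-off on the rating scale yields one ROC point. The sketch below, a minimal illustration rather than the authors' analysis code, cumulates rating counts from the strictest criterion (“4” only) to the most lenient (“2” or higher); the counts in the usage lines are made up for illustration and are not data from this study.

```python
def rating_roc(present_counts, absent_counts):
    """ROC points from confidence-rating counts.

    present_counts[i] and absent_counts[i] are the numbers of target-present and
    target-absent trials that received rating i + 1 (1 = definitely no target,
    4 = definitely a target). Returns (P_FA, P_hit) for the criteria
    'rating >= 4', 'rating >= 3', and 'rating >= 2', strictest first.
    """
    n_present, n_absent = sum(present_counts), sum(absent_counts)
    points = []
    for k in range(len(present_counts) - 1, 0, -1):
        hits = sum(present_counts[k:]) / n_present
        false_alarms = sum(absent_counts[k:]) / n_absent
        points.append((false_alarms, hits))
    return points


# Hypothetical counts for ratings 1..4 (not data from this study).
present = [5, 5, 15, 25]
absent = [25, 15, 5, 5]
points = rating_roc(present, absent)
```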

For this analysis, behavioral and matched filter ROC plots were compared across the five formats. Figure 3 illustrates the behavioral ROC plots across format types (blue = short-wave infrared; green = mid-wave infrared; red = long-wave infrared; fused gray = monochrome fused; fused color = color fusion). The fused color and long-wave infrared formats had the highest sensitivity, while the short-wave infrared and fused gray formats were near chance. The mid-wave sensitivity was between those of the short- and long-wave formats.

Figure 4 illustrates the matched filter ROC plot across format types. Again, the sensitivities show similar trends across format types, and the matched filter sensitivities are very similar to the behavioral ones. Therefore, the matched filter may be a viable alternative to human performance testing for assessing operator detection performance.




4.  EXPERIMENT  2 

4.1.  Introduction 

Cognitive scientists record eye movements to understand the cognitive processes that occur when an observer is searching for a target. Eye movement data show where and when the eye fixates within the scene; however, the data do not indicate what was processed. Rayner (1978) found that our eyes move three to four times per second while searching a scene. These saccadic eye movements enable the observer to extract important high-resolution spatial detail at each foveal fixation. Although there have been numerous eye movement studies investigating visual cognition, it is unclear what mechanisms control where, and on what, the eye will fixate next.

Biederman, Mezzanotte, and Rabinowitz (1982) found that subjects extract information outside the fovea during scene perception. Parafoveal and peripheral vision may extract certain features within a fixation; however, this information may or may not be identified. To integrate these parafoveal cues into an identifiable object, a fixation is required. To facilitate object identification, the more informative a region of the scene is, the more likely the observer is to fixate on it.

On the initial fixation, the observer obtains a global snapshot of the scene. Next, low-level visual cues such as color, brightness, and contours guide the observer's eye movements. Therefore, the level of informativeness within the picture will influence subjects' scene comprehension and object identification. It is hypothesized that a color-fused scene contains more informative features, in the signal-to-noise sense, than an achromatic scene. Subjects' initial eye movements will be guided to a color-fused target by its attributes, whereas the attributes of an achromatic target will not be informative enough to capture the observer's visual attention. Furthermore, the eye movement results will correlate with the ROC plots: targets embedded within the short- and mid-wave infrared scenes will require more saccades to identify, while those in the long-wave and fused conditions will be identified within the first few fixations.

4.2.  Methods 

Subjects: Ten male military officers participated in this eye movement study. All subjects had normal (20/20) or corrected-to-normal acuity and normal color vision. Subjects were naive to the purpose of the experiment, and none had participated in previous visual search experiments. All subjects signed an informed consent form and were briefed on the ethical conduct for subject participation in accordance with the Protection of Human Subjects [20].

Apparatus: Eye movements were recorded using an ISCAN, Inc. remote eye imaging system. The eye tracker is a video-based system that illuminates the eye with infrared light and uses a camera to record the pupil and corneal reflection. The eye tracker then calculates the difference between the pupil and corneal reflection positions to determine where the observer is fixating on the stimulus screen. The system operates at a sample rate of 60 Hz, and the subject's visual point of regard may be determined with an accuracy of better than one degree over a ±25° horizontal by ±20° vertical range.

Stimuli: Same as in Experiment 1.


Figure 5a. Subject CH was not able to identify the short-wave infrared target. The subject's initial fixation provided enough global information about where to search within the scene, but the low target contrast did not provide enough information to attract the visual system. This result is consistent with the low sensitivity in the visual search task, where subjects' sensitivity was near chance (d′ = 0).


Figure 5b. Subject JL found the target after the first fixation. The color-fused target contained enough visual information to guide the subject automatically to the target location. Thus, the color-fused target had a high level of informativeness, which enabled the subject to identify the target with little effort. Again, this result is consistent with the high sensitivity measured in the visual search experiment.

Procedure: Each subject participated in only one display format. At the beginning of the experimental session, the subject's head was placed in a chinrest positioned 1 meter from the stimulus monitor. The subject's right eye was then calibrated using a five-point calibration grid displayed on the stimulus monitor. To maintain an accurate calibration between the eye tracker and the stimulus monitor, periodic five-point calibration checks were conducted throughout the experimental session.

At the beginning of each trial, the subject fixated the center of the screen. Once the subject's eye was in the desired location, the experimenter initiated the trial. A fixation cross was presented for 200 milliseconds, immediately followed by a 5-second presentation of the experimental stimulus.

Subjects were instructed to search for the target and, once it was identified, to maintain fixation on it until the stimulus was extinguished. There were 32 target trials and 16 noise trials for each format. Subjects' point of regard was recorded at 60 Hz. No feedback on target identification accuracy was given.

4.3.  Results 

In each trial, the eye movement recording apparatus sampled the observer's fixation point at 60 Hz; hence, a total of 300 data points were obtained per 5-second trial. An analysis software tool was then used to analyze the data, with a minimum fixation duration of 40 ms and maximum horizontal and vertical eye deviations of ±5 and ±3 pixels, respectively. From this analysis, the number of fixations, the duration of each fixation, and the distance between fixations could be determined. These data were then tabulated to calculate the mean and standard error of the mean of the fixation duration, number of fixations, and scan path length.
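
The fixation criteria above can be sketched as a simple dispersion-based grouping. The actual analysis tool is not identified, so this is a hypothetical reconstruction: consecutive 60 Hz samples join the current fixation while they stay within ±5 pixels horizontally and ±3 pixels vertically of the group's running centroid, groups shorter than 40 ms are discarded, and scan path length is taken as the summed distance between successive fixation centroids.

```python
import math

SAMPLE_HZ = 60         # eye tracker sample rate from the text
MIN_FIX_MS = 40        # minimum fixation duration from the text
MAX_DX, MAX_DY = 5, 3  # maximum horizontal / vertical deviation (pixels) from the text


def _centroid(group):
    n = len(group)
    return sum(p[0] for p in group) / n, sum(p[1] for p in group) / n


def detect_fixations(samples, hz=SAMPLE_HZ, min_ms=MIN_FIX_MS, max_dx=MAX_DX, max_dy=MAX_DY):
    """Group consecutive (x, y) gaze samples into fixations; return (cx, cy, duration_ms)."""
    min_samples = math.ceil(min_ms * hz / 1000)  # 40 ms at 60 Hz -> 3 samples
    fixations, group = [], []

    def close_group(g):
        if len(g) >= min_samples:
            cx, cy = _centroid(g)
            fixations.append((cx, cy, len(g) * 1000 / hz))

    for x, y in samples:
        if group:
            cx, cy = _centroid(group)
            if abs(x - cx) <= max_dx and abs(y - cy) <= max_dy:
                group.append((x, y))
                continue
            close_group(group)
        group = [(x, y)]
    close_group(group)
    return fixations


def scan_path_length(fixations):
    """Summed Euclidean distance between successive fixation centroids (pixels)."""
    return sum(math.dist(a[:2], b[:2]) for a, b in zip(fixations, fixations[1:]))


# Synthetic gaze trace (illustrative only): two steady fixations and a brief glance.
trace = [(100, 100)] * 6 + [(200, 150)] * 4 + [(300, 300)] * 2
fixes = detect_fixations(trace)
path = scan_path_length(fixes)
```

At 60 Hz the 40 ms minimum rounds up to three samples (50 ms), so the two-sample glance at the end of the trace is not counted as a fixation.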

Subjects within the short- and mid-wave infrared and gray fused conditions showed more fixations and longer scan path lengths than those within the long-wave infrared and color-fused conditions. The long-wave and color-fused targets contained enough informative attributes to guide the subjects' eye movements to the desired location. Figure 5 illustrates one subject's search for a short-wave infrared target and another subject's search for a color-fused target. The second subject immediately identified the color-fused target, while the first subject was not able to find the short-wave infrared target.

The subject viewing the short-wave infrared scene obtained enough global information within the initial fixation to search the areas where the target was more likely to be located. However, the target's poor contrast prevented the subject from identifying its location. In contrast, the fused-color condition provided enough information within the first fixation to guide the subject to the target's location. The target's good spatial characteristics and large color contrast easily guided the subject to the appropriate critical region.

These results parallel the ROC results. The low sensitivities in the short- and mid-wave infrared and gray fused conditions match the eye scan data: subjects in these conditions were not able to find the target within the first few fixations, which is consistent with low sensitivity. Subjects in the color-fused and long-wave infrared conditions easily identified the target, which is consistent with high sensitivity.


5.  CONCLUSION 

The purpose of this experiment was to compare matched filter analysis with human behavioral signal detection. The matched filter results show that the sensitivities of the different sensor formats are similar to the behavioral sensitivities.

Although the ROC sensitivities of the formats are not quantitatively identical, the matched filter technique gives excellent qualitative agreement with the human performance tests. Additional refinement of the matched filter should result in even better agreement. Ogawa (1997) found that the matched filter ROC was consistently superior to the behavioral ROC; his matched filter did not account for the limitations of the human visual system. The Gaussian blur was added to the matched filter processing to represent the resolution limit of the human visual system more accurately, and it caused our results to behave more like the behavioral ROC than Ogawa's results did. Further refinement of the exact amount of Gaussian blur applied in the matched filter should improve the correlation between the two ROC plots. The eye movement results show that subjects were less able to identify the targets in the short- and mid-wave infrared and gray fused conditions than in the color and long-wave infrared conditions. The color and long-wave infrared targets possessed important visual attributes that enabled the subject to identify the target with little to no effort. A surprising finding was the poor performance of the gray fused condition in both the signal detection and eye scan experiments. Subjects' guided search for the target was not solely dependent upon spatial content; rather, visual search was mediated by both spatial and color target attributes. This finding indicates that color fusion is more appropriate for targeting applications than monochrome fusion. The color-fused target "pops out" at the subject, which increases signal-to-noise sensitivity.

In summary, the matched filter technique may be useful for predicting human visual sensitivity for different sensor types and target characteristics. The matched filter technique will provide system engineers with a rough approximation of human sensitivity to a target. This information could then be used for rapid prototyping of a system, to enhance the predictability of existing electro-optical models, and to provide a metric for testing multi-spectral sensors. Additional tests will need to be conducted to test the robustness of the matched filter across different signal-to-noise ratios, terrain and target types, and various atmospheric and illumination conditions. Finally, this matched filter will assist human factors testing by reducing the number of parameters that must be tested to achieve the desired goal. Human factors testing will always be required, but the matched filter technique may at least give the human factors group a better understanding of how the human will respond in the field.